Treating Dictionaries as a Linked-Data Corpus
Authors
Abstract
In this paper we describe a practical approach to the challenge of linguistic retrodigitization. We propose to distinguish strictly between a base digitization and a separate interpretation of the sources. The base digitization includes only a literal electronic transcript of the source. All sources are thus treated simply as strings of characters, i.e. as unstructured corpora. The often complex structure found in many dictionaries and grammars is added subsequently (and possibly much later) as Linked Data in the form of standoff annotation. A further advantage of this approach is that both the digitization and the interpretation can be performed collaboratively without a complex organizational superstructure.
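The separation between a literal transcript and standoff annotation can be illustrated with a minimal Python sketch. It is not the authors' implementation: the sample transcript, the property names `headword` and `gloss`, the helper names, and the `http://example.org/` URI are all hypothetical. The sketch assumes that annotations address the untouched transcript by character offsets and are exported as subject-property-value triples, with the subject written as a character-range fragment in the style of RFC 5147.

```python
from dataclasses import dataclass

# Hypothetical base digitization: a literal transcript kept as a plain string.
# It is never modified; all structure lives in the annotations below.
TRANSCRIPT = "aba n. father. abu n. water."


@dataclass(frozen=True)
class Annotation:
    """A standoff annotation: a character range plus an interpretive label."""
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    prop: str   # hypothetical property name, e.g. "headword"

    def text(self, source: str) -> str:
        """Return the annotated substring of the source."""
        return source[self.start:self.end]


def annotate(source: str, substring: str, prop: str) -> Annotation:
    """Annotate the first occurrence of `substring` in `source`."""
    start = source.index(substring)
    return Annotation(start, start + len(substring), prop)


def to_triples(source_uri: str, source: str, annotations: list) -> list:
    """Export annotations as (subject, property, value) triples.

    The subject is the source URI plus a character-range fragment,
    so each triple points back into the unmodified transcript.
    """
    return [
        (f"{source_uri}#char={a.start},{a.end}", a.prop, a.text(source))
        for a in annotations
    ]


# Interpretation is added separately from the transcript, possibly much later.
annotations = [
    annotate(TRANSCRIPT, "aba", "headword"),
    annotate(TRANSCRIPT, "father", "gloss"),
    annotate(TRANSCRIPT, "abu", "headword"),
    annotate(TRANSCRIPT, "water", "gloss"),
]

triples = to_triples("http://example.org/source.txt", TRANSCRIPT, annotations)
for triple in triples:
    print(triple)
```

Because the annotations only reference offsets, different contributors could add or revise interpretations independently without touching the transcript itself, which is the property the abstract relies on for collaborative work.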
Similar resources
Multilingual linked data
The interaction of natural language processing and the Semantic Web has led to the creation of a new paradigm known as Linguistic Linked Open Data (LLOD), whereby traditional language resources are made available as linked data. Conversely, the publication of corpora and machine-readable dictionaries as linked data has opened new resources to Semantic Web researchers and allowed new tools to be ...
X-Linked Lissencephaly with Absent Corpus Callosum and Ambiguous Genitalia: A Case Report
Background: X-linked lissencephaly with ambiguous genitalia (XLAG) is a recently described genetic disorder, in which patients present with lissencephaly, agenesis of the corpus callosum, refractory epilepsy of neonatal onset, acquired microcephaly, and male genotype with ambiguous genitalia. XLAG is responsible for a severe neurological disorder of neonatal onset in boys. A gyration defect con...
Optimized Selection of Intonation Dictionaries in Corpus Based Intonation Modelling
Data scarcity in corpus-based intonation modelling for TTS applications is addressed. We propose to apply a search process to a list of dictionaries of classes of intonation patterns previously trained from a corpus, to avoid problems associated with the small number of samples in the classes. Results indicate improvements over previous alternatives where the ...
Color Dictionaries and Corpora
In the study of linguistics, a corpus is a data set of naturally occurring language (speech or writing) that can be used to generate or test linguistic hypotheses. The study of color naming worldwide has been carried out using three types of data sets: (1) corpora of empirical color-naming data collected from native speakers of many languages; (2) scholarly data sets where the color terms are o...
Machine-Readable Dictionaries in Text-to-Speech Systems
This paper presents the results of an experiment using machine-readable dictionaries (MRDs) and corpora for building concatenative units for text to speech (TTS) systems. Theoretical questions concerning the nature of phonemic data in dictionaries are raised; phonemic dictionary data is viewed as a representative corpus over which to extract n-gram phonemic frequencies in the language. Dict...
Publication date: 2012